Text File | 1995-06-19 | 8.3 KB | 224 lines | [TEXT/ttxt]

                Service C++ functions and classes
    dealing mostly with "advanced" i/o and the arithmetic compression

***** For the version history, read on

***** Verification files: vendian_io, vhistogram, varithm

Don't forget to compile and run them; see the comments in the Makefile
for details. The verification code checks that all the functions in
this package have compiled and run well. The code can also serve as an
example of how the package classes/functions can be used.

***** Highlights and idioms

---- Extended file names

The package adds support for "extended" file names with pipes in them.
That is, the name of a file to open may now be specified as "| command"
or "command |", i.e. as a pipe. For example,

        EndianIn istream;
        istream.open("gunzip < /tmp/aa.gz |");
        EndianOut stream("| compress > /tmp/aa.Z");
        image.write_pgm("| xv -");

The <command> is launched in a subprocess through '/bin/sh', with its
standard input/output hooked, through pipe(), to the file being opened.

This extension is implemented on the lowest possible level, right
before the request to open a file goes to the OS (through the system
call open(2)). A function sys_open() (in the source file sys_open.cc)
acts as a "patch": that is, if you call sys_open() instead of open()
to open a file, you get all the open() functionality plus the extended
file names. Thus, some libg++ 2.6.2 iostream functions were modified to
call sys_open() instead of open(). If one wants to use the extended
file names outside gcc/libg++, one needs to make the open->sys_open
substitution oneself.

---- Explicit Endian I/O of short/long integers

        EndianOut stream("/tmp/aa");
        stream.set_littlendian();
        stream.write_long(1);

That means 1 is written as a long integer with the least significant
byte first, NO MATTER which computer (computer architecture) the code
is running on. Using an explicit endian specification (as above) is
the only way to ensure portability of binary files containing
arithmetic data.
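
To illustrate what an explicit byte order buys, here is a minimal
sketch that is independent of this package. The helpers put_long_le(),
put_long_be() and get_long_le() are hypothetical names, not the
package's API, and a "long" is assumed to be 32 bits wide. The bytes
are produced by shifting, so the result does not depend on the byte
order of the machine running the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of this package): append a 32-bit value
// to `out`, least significant byte first, whatever the host byte order.
inline void put_long_le(std::string& out, std::uint32_t v) {
  for (int shift = 0; shift < 32; shift += 8)
    out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Same value, most significant byte first.
inline void put_long_be(std::string& out, std::uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Read back four bytes starting at `pos`, interpreting them as
// little-endian.
inline std::uint32_t get_long_le(const std::string& in, std::size_t pos) {
  assert(pos + 4 <= in.size());
  std::uint32_t v = 0;
  for (int i = 3; i >= 0; --i)
    v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
  return v;
}
```

Writing 1 with put_long_le() yields the bytes 01 00 00 00 on any host;
reading those bytes back with get_long_le() returns 1 again.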

---- Stream sharing

EndianIn/Out streams can share the same i/o buffer. This is useful when
one needs to read/write a "stratified" (layered) file consisting of
various variable-bit encoded data interspersed with headers. For
example, a file may begin with a header (telling the total number of
data items, normalization factors) followed by some variable-bit
encoding of items, followed by another header, followed by an
arithmetic compressed stream of data, etc. Thus, a file can be like a
waffle pie, made of many layers: each of them being interpreted using
different streams, each of them collectively sharing the same file and
the same file pointer. The situation is similar to sharing an open file
(and a file pointer) among parent and child (forked) processes. Note
that merely opening a stream on a dup()-ed file handle, or sync()-ing
the stream doesn't cut it entirely. See endian_io.cc for more
discussion. The bottom line is, this package implements stream sharing
in a safe and portable way: it works on a Mac just as well as on
different flavors of UNIX.

---- Simple variable-length coding of short integers

The code is intended for writing a collection of short integers where
many of them are rather small in value; still, big values can crop up
at times, so we can't limit the size of the code to anything less than
16 bits. The code is a variation of a start-stop code described in
Appendix A, "Variable-length representations of the integers" of the
"Text Compression" book by T.Bell, J.Cleary and I.Witten, p.290-295.
The present code features support for both negative and positive
numbers and an optimization based on the fact that all numbers are no
larger than 2^15-1 in abs value, and an assumption that most of them
are smaller than 512 (in absolute value).
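
The idea of giving small values a shorter code can be sketched as
follows. This is an illustrative two-class code, NOT the bit layout the
package actually uses: the layout, the Bits typedef and the functions
put_bits(), get_bits(), put_short() and get_short() are all
assumptions made for the example. A value with |v| < 512 takes 11 bits
(class bit, sign bit, 9-bit magnitude); any other value up to 2^15-1 in
absolute value takes 17 bits (class bit, sign bit, 15-bit magnitude).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only, not the package's format. One element per bit.
typedef std::vector<bool> Bits;

// Append the low `nbits` bits of `value`, most significant bit first.
inline void put_bits(Bits& out, unsigned value, int nbits) {
  for (int b = nbits - 1; b >= 0; --b)
    out.push_back(((value >> b) & 1u) != 0);
}

// Read `nbits` bits starting at `pos`, advancing `pos`.
inline unsigned get_bits(const Bits& in, std::size_t& pos, int nbits) {
  assert(pos + static_cast<std::size_t>(nbits) <= in.size());
  unsigned value = 0;
  for (int b = 0; b < nbits; ++b)
    value = (value << 1) | (in[pos++] ? 1u : 0u);
  return value;
}

// Encode a signed integer with |v| <= 2^15-1.
inline void put_short(Bits& out, int v) {
  assert(v >= -32767 && v <= 32767);
  const unsigned mag = static_cast<unsigned>(v < 0 ? -v : v);
  const bool is_small = mag < 512;
  out.push_back(!is_small);               // class selector: 0 = short form
  out.push_back(v < 0);                   // sign bit
  put_bits(out, mag, is_small ? 9 : 15);  // magnitude
}

// Decode one integer written by put_short().
inline int get_short(const Bits& in, std::size_t& pos) {
  assert(pos + 2 <= in.size());
  const bool is_large = in[pos++];
  const bool negative = in[pos++];
  const int mag = static_cast<int>(get_bits(in, pos, is_large ? 15 : 9));
  return negative ? -mag : mag;
}
```

With this layout a stream dominated by small values costs 11 bits per
item instead of 16, at the price of 17 bits for each occasional large
value.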

---- Arithmetic compression of a stream of integers

The present package provides a clean C++ implementation of Bell, Cleary
and Witten's arithmetic compression code, with a clear separation
between a model and the coder. ArithmCodingIn / ArithmCodingOut act as
i/o streams that encode the signed short integers you put() to them,
and decode them when you get() them. An ArithmCodingIn/Out object needs
a "plug-in" of a class Input_Data_Model when the stream is created. The
Input_Data_Model object is responsible for providing the codec with the
probabilities (frequencies) with which a given data item is expected to
appear, and for finding a symbol given its cumulative frequency.
Input_Data_Model may also modify itself to account for a new symbol.

Thus, the ArithmCoding class is a sort of 'iostream' class that
writes/reads data items to/from the stream, performing
encoding/decoding. It relies upon the Input_Data_Model for the
probabilities needed to perform the arithmetic coding.

The current version of the package provides two Input_Data_Model
plug-ins, both performing adaptive "modeling" of a stream of integers.
The first plug-in uses a simple 0-order adaptive prediction (like the
model given in Bell, Cleary and Witten's book). The other one takes a
histogram to sketch the initial distribution, and is somewhat more
sophisticated in updating the model. It is used in compressing a
wavelet decomposition of an image.

The code below (taken literally from varithm.cc) demonstrates how the
coder classes are actually used.

The first example writes two different streams (of different patterns,
which is why it was better to encode them separately) into the same
file

        EndianOut stream("/tmp/aa");
        stream.set_littlendian();
        const int sample_header = 12345;
        {
          AdaptiveModel model(-1,4);
          ArithmCodingOut ac(model);
          ac.open(stream);
          for(i=0; i<sizeof(pattern1)/sizeof(pattern1[0]); i++)
            ac.put(pattern1[i]);
        }
        {
          stream.write_long(sample_header);  // write a "header"
          AdaptiveModel model(-1,4);         // followed by the arithmetic coded
          ArithmCodingOut ac(model);         // stream
          ac.open(stream);
          for(i=0; i<sizeof(pattern2)/sizeof(pattern2[0]); i++)
            ac.put(pattern2[i]);
        }
        stream.close();

The reading is similar.

The second example uses a different model plug-in, yet the i/o is
similar

        static void test_adh(void)
        {
          message("\nCreating Histogram ...\n");
          Histogram histogram(-7,7);
          register int i;
          for(i=0; i<MyPattern_size; i++)
            histogram.put(MyPattern[i]);

          message("\nWriting data ...");
          AdaptiveHistModel model(histogram);
          ArithmCodingOut ac(model);
          ac.open("/tmp/aa");
          for(i=0; i<MyPattern_size; i++)
            ac.put(MyPattern[i]);
          ac.close();
          message("\nCoded file /tmp/aa has been created\n");

          AdaptiveHistModel i_model;
          ArithmCodingIn ac1(i_model);
          ac1.open("/tmp/aa");
          for(i=0; i<MyPattern_size; i++)
          {
            register int val_read = ac1.get();
            if( val_read != MyPattern[i] )
              _error("Read value %d of the %d-th integer is not what it is "
                     "supposed to be, %d", val_read, i, MyPattern[i]);
          }
          ac1.get();
          assert( ac1.is_eof() );
        }

---- Convenience Functions

The package defines a few functions I found convenient to use, like
message(...) (which is equivalent to fprintf(stderr,....)) and
_error(...) (the same as message(...); abort();). One doesn't need to
#include <stdio.h> to use them.

***** Grand plans

***** Revision history

Version 2.3 - Jun 1995
        Fixed the last remaining incompatibility glitches. Now, exactly
        the same code compiles on a Mac with CodeWarrior 6 and on Unix
        with gcc 2.6.3

Version 2.2 - May 1995
        Added a variable-length (start/stop) coding of signed short
        integers. Added dealing with simple histograms of an
        integer-valued distribution.

Version 2.1 - Mar 1995
        Introducing bool where appropriate (instead of int) and adding
        checks to make sure an EndianIn/Out stream was opened
        successfully.

Version 2.0 - Feb 1995
        Big change: splitting EndianIO into EndianIn and EndianOut and
        removing all libg++-specific things; everything should be very
        portable now. Making sharing of the streambuffer portable.

Version 1.4 - Feb 1994
        Updated for libg++ 2.5.3

Version 1.3 - Aug 1993
        Introducing attachment of one stream to another, or sharing of
        a streambuf among several streams. Took care of properly
        terminating an arithm coding stream by writing a few phony bits
        at the end (so we won't hit the EOF on reading). Thus it is
        possible now to concatenate arithmetic coding streams.

Version 1.2 - Jun 1992
        Updated to compile under gcc/g++ 2.2.1 and work with libg++ 2.0.
        The first implementation of the arithmetic coding package

Version 1.1 - Nov 1991 - May 1992
        Initial revision